Deep Dream from Scratch with PyTorch¶

Understanding How Neural Networks "See" Through Feature Visualization


DeepDream is a computer vision program created by Google engineer Alexander Mordvintsev that uses a convolutional neural network to find and enhance patterns in images via algorithmic pareidolia, thus creating a dream-like appearance reminiscent of a psychedelic experience in the deliberately overprocessed images. [https://en.wikipedia.org/wiki/DeepDream]

Deep Dream was introduced in 2015 and quickly became famous for its surreal, psychedelic images filled with swirling patterns, eyes, and dog faces. But beyond the trippy visuals, Deep Dream teaches us something fundamental: how neural networks perceive and represent visual information.

In this tutorial, we'll build Deep Dream from scratch using PyTorch. By the end, you'll understand:

  • How convolutional neural networks detect features at different layers
  • How to use gradients to modify an image (instead of modifying model weights)
  • Why Deep Dream images look the way they do
  • How this connects to modern AI interpretability research

The Core Idea:¶

Normal use of a trained network:

Image goes in → network detects features layer by layer (edges → shapes → textures → objects) → outputs a label like "dog". The image is only read, never changed.

Deep Dream flips the goal:

Instead of asking "what's in this image?", you ask: "Change this image so the patterns you're detecting become stronger."

Here's the actual process:

  1. You feed your image into the pre-trained network
  2. You pick a layer in the network (each layer detects different things—early layers see edges, deeper layers see complex objects like faces or animals)
  3. The network looks at that layer's activations (how strongly it's detecting patterns)
  4. Now the key part: you calculate how to tweak each pixel in your original image to make those activations stronger
  5. You slightly modify the image pixels accordingly
  6. Repeat steps 1–5 many times

Each iteration, the image gets nudged to contain "more" of whatever that layer is detecting. After many loops, faint patterns snowball into vivid, exaggerated features.

We don't train a model.

We take a pre-trained model and freeze it.

We forward pass an image through the model and capture the activations at a chosen layer.

Then we compute gradients of those activations with respect to the input image pixels.

Finally, we add these gradients back to the image—this amplifies whatever patterns the layer detected. Repeat this process, and the patterns snowball into the trippy Deep Dream effect.
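The loop described above can be sketched in a few lines. This is a minimal illustration, not the full implementation we build later: `layer_fn` is a hypothetical helper standing in for "run the image through the frozen model and return the chosen layer's activations".

```python
import torch

def dream_step(layer_fn, img, lr=0.01):
    """One Deep Dream iteration: nudge pixels to boost a layer's activations.

    layer_fn: any callable mapping an image tensor to activation tensor
              (a stand-in for the hooked model we build below).
    """
    img = img.clone().detach().requires_grad_(True)
    activations = layer_fn(img)
    loss = activations.norm()          # "how strongly is this layer firing?"
    loss.backward()                    # gradients w.r.t. the image pixels
    grad = img.grad / (img.grad.abs().mean() + 1e-8)  # normalize step size
    with torch.no_grad():
        img = img + lr * grad          # gradient ASCENT on the pixels
    return img.detach()
```

Calling `dream_step` repeatedly on its own output is exactly the "repeat steps 1–5" loop: each pass amplifies whatever the chosen layer already responds to.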

1. Setup and Imports¶

We'll use a pre-trained InceptionV3 model, a later revision of the Inception (GoogLeNet) architecture Google used in the original Deep Dream.

In [1]:
# Install dependencies if needed
!pip install torch torchvision pillow matplotlib numpy
In [2]:
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
from typing import List, Tuple, Optional
import requests
from io import BytesIO

# Check for GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
Using device: cuda

2. Understanding the Core Idea¶

In normal neural network usage:

  • Input image → Network → Output prediction
  • We adjust the network weights to improve predictions

In Deep Dream:

  • Input image → Network → Layer activations
  • We adjust the input image pixels to amplify those activations

The key insight: we compute gradients with respect to the input image, not the weights. Then we add those gradients to the image to make the detected patterns stronger.
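This distinction is easy to demonstrate in isolation. In the toy example below (illustrative names only), the "weights" are frozen and only the "image" receives a gradient:

```python
import torch

# Stand-in for frozen model weights: no gradient is tracked for them.
w = torch.randn(4, 4)
w.requires_grad_(False)

# Stand-in for the input image: THIS is what we differentiate through.
x = torch.randn(4, requires_grad=True)

y = (w @ x).pow(2).sum()   # stand-in for "activation strength"
y.backward()

assert w.grad is None      # weights untouched, as in Deep Dream
assert x.grad is not None  # the input gets a gradient instead

# Gradient ascent on the input, not descent on the weights:
x_updated = (x + 0.01 * x.grad).detach()
```

Training flips these roles: there, `w` would require gradients and `x` would not.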

3. Load the Pre-trained Model¶

We'll use InceptionV3 and create a wrapper that lets us extract activations from any layer.

In [3]:
class DeepDreamModel(nn.Module):
    """
    Wrapper around InceptionV3 that captures intermediate layer activations.
    """
    def __init__(self):
        super().__init__()
        # Load pre-trained InceptionV3
        self.inception = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)
        self.inception.eval()

        # Freeze all parameters
        for param in self.inception.parameters():
            param.requires_grad = False

        # Dictionary to store layer outputs
        self.layer_outputs = {}
        self.hooks = []

        # Register hooks on the actual module structure
        self._register_hooks()

    def _register_hooks(self):
        """Register forward hooks on layers we want to visualize."""

        # Get actual layer references from the model
        target_layers = {
            'Conv2d_1a': self.inception.Conv2d_1a_3x3,
            'Conv2d_2b': self.inception.Conv2d_2b_3x3,
            'Mixed_5b': self.inception.Mixed_5b,
            'Mixed_5c': self.inception.Mixed_5c,
            'Mixed_5d': self.inception.Mixed_5d,
            'Mixed_6a': self.inception.Mixed_6a,
            'Mixed_6b': self.inception.Mixed_6b,
            'Mixed_6c': self.inception.Mixed_6c,
            'Mixed_6d': self.inception.Mixed_6d,
            'Mixed_6e': self.inception.Mixed_6e,
            'Mixed_7a': self.inception.Mixed_7a,
            'Mixed_7b': self.inception.Mixed_7b,
            'Mixed_7c': self.inception.Mixed_7c,
        }

        for name, layer in target_layers.items():
            hook = layer.register_forward_hook(self._make_hook(name))
            self.hooks.append(hook)

        # Store the layer names so we can report them
        self._layer_names = list(target_layers.keys())

    def _make_hook(self, name):
        """Create a hook function that saves the layer output."""
        def hook(module, input, output):
            self.layer_outputs[name] = output
        return hook

    def forward(self, x):
        """Forward pass - this populates self.layer_outputs."""
        self.layer_outputs = {}
        _ = self.inception(x)
        return self.layer_outputs

    def get_available_layers(self):
        """Return list of layers we can dream on."""
        return self._layer_names


# Initialize the model
model = DeepDreamModel().to(device)

# Run a dummy forward pass to verify hooks work
dummy = torch.randn(1, 3, 299, 299).to(device)
_ = model(dummy)
print("Model loaded!")
print(f"Available layers for dreaming: {model.get_available_layers()}")
print(f"Hooks captured {len(model.layer_outputs)} layers")
Downloading: "https://download.pytorch.org/models/inception_v3_google-0cc3c7bd.pth" to /root/.cache/torch/hub/checkpoints/inception_v3_google-0cc3c7bd.pth
100%|██████████| 104M/104M [00:00<00:00, 177MB/s] 
Model loaded!
Available layers for dreaming: ['Conv2d_1a', 'Conv2d_2b', 'Mixed_5b', 'Mixed_5c', 'Mixed_5d', 'Mixed_6a', 'Mixed_6b', 'Mixed_6c', 'Mixed_6d', 'Mixed_6e', 'Mixed_7a', 'Mixed_7b', 'Mixed_7c']
Hooks captured 13 layers

What is a hook?

A hook is a way to "spy on" what's happening inside a neural network during the forward pass. Normally, when you run a model:

Input → Layer1 → Layer2 → Layer3 → ... → Output
                    ↑
            (intermediate values are computed
            but you never see them)

With a hook, you can intercept those intermediate values:

Input → Layer1 → Layer2 → Layer3 → ... → Output
                    │
                    ├──→ Hook captures this!
                    │    (saved to a variable)

Why we need hooks for Deep Dream:

  • We need the activations from a middle layer (not the final output)
  • We want to compute gradients with respect to those activations
  • Hooks let us "reach inside" and grab those values

Let's look at the model:

In [4]:
model
Out[4]:
DeepDreamModel(
  (inception): Inception3(
    (Conv2d_1a_3x3): BasicConv2d(
      (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
      (bn): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (Conv2d_2a_3x3): BasicConv2d(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (Conv2d_2b_3x3): BasicConv2d(
      (conv): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (maxpool1): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (Conv2d_3b_1x1): BasicConv2d(
      (conv): Conv2d(64, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(80, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (Conv2d_4a_3x3): BasicConv2d(
      (conv): Conv2d(80, 192, kernel_size=(3, 3), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (maxpool2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (Mixed_5b): InceptionA(
      (branch1x1): BasicConv2d(
        (conv): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch5x5_1): BasicConv2d(
        (conv): Conv2d(192, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch5x5_2): BasicConv2d(
        (conv): Conv2d(48, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
        (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_1): BasicConv2d(
        (conv): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_2): BasicConv2d(
        (conv): Conv2d(64, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_3): BasicConv2d(
        (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch_pool): BasicConv2d(
        (conv): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (Mixed_5c): InceptionA(
      (branch1x1): BasicConv2d(
        (conv): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch5x5_1): BasicConv2d(
        (conv): Conv2d(256, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch5x5_2): BasicConv2d(
        (conv): Conv2d(48, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
        (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_1): BasicConv2d(
        (conv): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_2): BasicConv2d(
        (conv): Conv2d(64, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_3): BasicConv2d(
        (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch_pool): BasicConv2d(
        (conv): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (Mixed_5d): InceptionA(
      (branch1x1): BasicConv2d(
        (conv): Conv2d(288, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch5x5_1): BasicConv2d(
        (conv): Conv2d(288, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch5x5_2): BasicConv2d(
        (conv): Conv2d(48, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
        (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_1): BasicConv2d(
        (conv): Conv2d(288, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_2): BasicConv2d(
        (conv): Conv2d(64, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_3): BasicConv2d(
        (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch_pool): BasicConv2d(
        (conv): Conv2d(288, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (Mixed_6a): InceptionB(
      (branch3x3): BasicConv2d(
        (conv): Conv2d(288, 384, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_1): BasicConv2d(
        (conv): Conv2d(288, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_2): BasicConv2d(
        (conv): Conv2d(64, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_3): BasicConv2d(
        (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (bn): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (Mixed_6b): InceptionC(
      (branch1x1): BasicConv2d(
        (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7_1): BasicConv2d(
        (conv): Conv2d(768, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7_2): BasicConv2d(
        (conv): Conv2d(128, 128, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)
        (bn): BatchNorm2d(128, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7_3): BasicConv2d(
        (conv): Conv2d(128, 192, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_1): BasicConv2d(
        (conv): Conv2d(768, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_2): BasicConv2d(
        (conv): Conv2d(128, 128, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)
        (bn): BatchNorm2d(128, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_3): BasicConv2d(
        (conv): Conv2d(128, 128, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)
        (bn): BatchNorm2d(128, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_4): BasicConv2d(
        (conv): Conv2d(128, 128, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)
        (bn): BatchNorm2d(128, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_5): BasicConv2d(
        (conv): Conv2d(128, 192, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch_pool): BasicConv2d(
        (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (Mixed_6c): InceptionC(
      (branch1x1): BasicConv2d(
        (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7_1): BasicConv2d(
        (conv): Conv2d(768, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7_2): BasicConv2d(
        (conv): Conv2d(160, 160, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)
        (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7_3): BasicConv2d(
        (conv): Conv2d(160, 192, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_1): BasicConv2d(
        (conv): Conv2d(768, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_2): BasicConv2d(
        (conv): Conv2d(160, 160, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)
        (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_3): BasicConv2d(
        (conv): Conv2d(160, 160, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)
        (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_4): BasicConv2d(
        (conv): Conv2d(160, 160, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)
        (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_5): BasicConv2d(
        (conv): Conv2d(160, 192, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch_pool): BasicConv2d(
        (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (Mixed_6d): InceptionC(
      (branch1x1): BasicConv2d(
        (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7_1): BasicConv2d(
        (conv): Conv2d(768, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7_2): BasicConv2d(
        (conv): Conv2d(160, 160, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)
        (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7_3): BasicConv2d(
        (conv): Conv2d(160, 192, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_1): BasicConv2d(
        (conv): Conv2d(768, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_2): BasicConv2d(
        (conv): Conv2d(160, 160, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)
        (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_3): BasicConv2d(
        (conv): Conv2d(160, 160, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)
        (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_4): BasicConv2d(
        (conv): Conv2d(160, 160, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)
        (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_5): BasicConv2d(
        (conv): Conv2d(160, 192, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch_pool): BasicConv2d(
        (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (Mixed_6e): InceptionC(
      (branch1x1): BasicConv2d(
        (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7_1): BasicConv2d(
        (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7_2): BasicConv2d(
        (conv): Conv2d(192, 192, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7_3): BasicConv2d(
        (conv): Conv2d(192, 192, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_1): BasicConv2d(
        (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_2): BasicConv2d(
        (conv): Conv2d(192, 192, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_3): BasicConv2d(
        (conv): Conv2d(192, 192, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_4): BasicConv2d(
        (conv): Conv2d(192, 192, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7dbl_5): BasicConv2d(
        (conv): Conv2d(192, 192, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch_pool): BasicConv2d(
        (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (AuxLogits): InceptionAux(
      (conv0): BasicConv2d(
        (conv): Conv2d(768, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (conv1): BasicConv2d(
        (conv): Conv2d(128, 768, kernel_size=(5, 5), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(768, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (fc): Linear(in_features=768, out_features=1000, bias=True)
    )
    (Mixed_7a): InceptionD(
      (branch3x3_1): BasicConv2d(
        (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3_2): BasicConv2d(
        (conv): Conv2d(192, 320, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (bn): BatchNorm2d(320, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7x3_1): BasicConv2d(
        (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7x3_2): BasicConv2d(
        (conv): Conv2d(192, 192, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7x3_3): BasicConv2d(
        (conv): Conv2d(192, 192, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch7x7x3_4): BasicConv2d(
        (conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (Mixed_7b): InceptionE(
      (branch1x1): BasicConv2d(
        (conv): Conv2d(1280, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(320, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3_1): BasicConv2d(
        (conv): Conv2d(1280, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3_2a): BasicConv2d(
        (conv): Conv2d(384, 384, kernel_size=(1, 3), stride=(1, 1), padding=(0, 1), bias=False)
        (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3_2b): BasicConv2d(
        (conv): Conv2d(384, 384, kernel_size=(3, 1), stride=(1, 1), padding=(1, 0), bias=False)
        (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_1): BasicConv2d(
        (conv): Conv2d(1280, 448, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(448, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_2): BasicConv2d(
        (conv): Conv2d(448, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_3a): BasicConv2d(
        (conv): Conv2d(384, 384, kernel_size=(1, 3), stride=(1, 1), padding=(0, 1), bias=False)
        (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_3b): BasicConv2d(
        (conv): Conv2d(384, 384, kernel_size=(3, 1), stride=(1, 1), padding=(1, 0), bias=False)
        (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch_pool): BasicConv2d(
        (conv): Conv2d(1280, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (Mixed_7c): InceptionE(
      (branch1x1): BasicConv2d(
        (conv): Conv2d(2048, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(320, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3_1): BasicConv2d(
        (conv): Conv2d(2048, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3_2a): BasicConv2d(
        (conv): Conv2d(384, 384, kernel_size=(1, 3), stride=(1, 1), padding=(0, 1), bias=False)
        (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3_2b): BasicConv2d(
        (conv): Conv2d(384, 384, kernel_size=(3, 1), stride=(1, 1), padding=(1, 0), bias=False)
        (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_1): BasicConv2d(
        (conv): Conv2d(2048, 448, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(448, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_2): BasicConv2d(
        (conv): Conv2d(448, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_3a): BasicConv2d(
        (conv): Conv2d(384, 384, kernel_size=(1, 3), stride=(1, 1), padding=(0, 1), bias=False)
        (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch3x3dbl_3b): BasicConv2d(
        (conv): Conv2d(384, 384, kernel_size=(3, 1), stride=(1, 1), padding=(1, 0), bias=False)
        (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
      (branch_pool): BasicConv2d(
        (conv): Conv2d(2048, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
    (dropout): Dropout(p=0.5, inplace=False)
    (fc): Linear(in_features=2048, out_features=1000, bias=True)
  )
)

Mixed_5c is not a single layer—it's an Inception module, which is a mini-network with multiple parallel branches. Here's what's happening inside Mixed_5c:

                      Input
                        │
        ┌──────────────┼──────────────┬──────────────┐
        │              │              │              │
        ▼              ▼              ▼              ▼
    branch1x1     branch5x5      branch3x3dbl   branch_pool
      (1x1 conv)   (1x1 → 5x5)   (1x1 → 3x3 → 3x3)  (pool → 1x1)
        │              │              │              │
        └──────────────┴──────────────┴──────────────┘
                        │
                    Concatenate
                        │
                    Output

The key insight of the Inception architecture is: we don't know which filter size is best for detecting a feature, so let's use multiple sizes in parallel.

  • branch1x1: captures fine-grained, pixel-level features
  • branch5x5: captures medium-sized patterns (done as 1x1 → 5x5 to reduce computation)
  • branch3x3dbl: captures larger patterns via stacked 3x3 convolutions
  • branch_pool: preserves spatial information through pooling

All four branches process the same input, then their outputs are concatenated along the channel dimension.
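The channel arithmetic can be sketched with plain arrays (a hedged illustration using NumPy stand-ins for the four branch outputs; the channel counts below are illustrative examples, not values read out of the real model):

```python
import numpy as np

# Four fake branch outputs for one Inception module. Each branch sees the
# same input and produces a map with the same spatial size (35x35 here)
# but its own channel depth (example counts, NCHW layout):
n, h, w = 1, 35, 35
branch1x1    = np.zeros((n, 64, h, w))
branch5x5    = np.zeros((n, 64, h, w))
branch3x3dbl = np.zeros((n, 96, h, w))
branch_pool  = np.zeros((n, 64, h, w))

# The module's output is the channel-wise concatenation (axis=1 in NCHW),
# so the output depth is just the sum of the branch depths:
out = np.concatenate([branch1x1, branch5x5, branch3x3dbl, branch_pool], axis=1)
print(out.shape)  # (1, 288, 35, 35) — 64 + 64 + 96 + 64 channels
```

Spatial dimensions must match across branches for the concatenation to work, which is why the pooling branch uses padding that preserves height and width.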

4. Image Preprocessing¶

We need to:

  1. Normalize the image for InceptionV3 (it expects specific mean/std)
  2. Be able to convert back for display
In [5]:
# ImageNet normalization values
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1).to(device)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1).to(device)


def load_image(path_or_url: str, max_size: int = 512) -> torch.Tensor:
    """
    Load an image from a file path or URL.
    Returns a normalized tensor ready for the model.
    """
    # Load image
    if path_or_url.startswith('http'):
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'}
        response = requests.get(path_or_url, headers=headers, allow_redirects=True)
        response.raise_for_status()
        img = Image.open(BytesIO(response.content)).convert('RGB')
    else:
        img = Image.open(path_or_url).convert('RGB')

    # Resize while maintaining aspect ratio
    ratio = max_size / max(img.size)
    new_size = tuple(int(dim * ratio) for dim in img.size)
    img = img.resize(new_size, Image.LANCZOS)

    # Convert to tensor [0, 1]
    img_tensor = transforms.ToTensor()(img).unsqueeze(0).to(device)

    # Normalize for InceptionV3
    img_tensor = (img_tensor - IMAGENET_MEAN) / IMAGENET_STD

    return img_tensor

def tensor_to_image(tensor: torch.Tensor) -> np.ndarray:
    """
    Convert a normalized tensor back to a displayable image.
    """
    # Denormalize
    img = tensor * IMAGENET_STD + IMAGENET_MEAN

    # Clip to valid range and convert
    img = img.squeeze(0).permute(1, 2, 0).cpu().detach().numpy()
    img = np.clip(img, 0, 1)

    return img


def show_image(tensor: torch.Tensor, title: str = ""):
    """Display a tensor as an image."""
    img = tensor_to_image(tensor)
    plt.figure(figsize=(10, 10))
    plt.imshow(img)
    plt.title(title)
    plt.axis('off')
    plt.show()

5. The Core Deep Dream Algorithm¶

Here's where the magic happens. The algorithm is surprisingly simple:

for each iteration:
    1. Forward pass: get layer activations
    2. Compute loss: mean of activations (we want to maximize this)
    3. Backward pass: compute gradients with respect to input image
    4. Update image: add gradients to amplify detected patterns
In [6]:
def deep_dream_step(
    model: DeepDreamModel,
    image: torch.Tensor,
    layer_name: str,
    learning_rate: float = 0.01
) -> torch.Tensor:
    """
    Perform a single Deep Dream optimization step.

    Args:
        model: The DeepDream model wrapper
        image: Input image tensor (requires_grad must be True)
        layer_name: Which layer to maximize activations for
        learning_rate: Step size for gradient ascent

    Returns:
        Updated image tensor
    """
    # Forward pass
    layer_outputs = model(image)

    # Get the activations from our target layer
    activations = layer_outputs[layer_name]

    # Loss = mean of activations
    # We want to MAXIMIZE this, so we'll do gradient ASCENT
    loss = activations.mean()

    # Backward pass - compute gradients w.r.t. image
    loss.backward()

    # Normalize gradients (helps stabilize the process)
    grad = image.grad.data
    grad = grad / (grad.std() + 1e-8)  # Normalize by standard deviation

    # Gradient ASCENT (add gradients, not subtract)
    image.data = image.data + learning_rate * grad

    # Clear gradients for next iteration
    image.grad.data.zero_()

    return image


def deep_dream(
    model: DeepDreamModel,
    image: torch.Tensor,
    layer_name: str,
    iterations: int = 20,
    learning_rate: float = 0.01,
    show_progress: bool = True
) -> torch.Tensor:
    """
    Run the Deep Dream algorithm.

    Args:
        model: The DeepDream model wrapper
        image: Input image tensor
        layer_name: Which layer to dream on
        iterations: Number of optimization steps
        learning_rate: Step size
        show_progress: Whether to print progress

    Returns:
        Dreamed image tensor
    """
    # Clone image and enable gradients
    dream_image = image.clone().requires_grad_(True)

    for i in range(iterations):
        dream_image = deep_dream_step(model, dream_image, layer_name, learning_rate)

        if show_progress and (i + 1) % 5 == 0:
            print(f"Iteration {i + 1}/{iterations}")

    return dream_image.detach()


print("Deep Dream functions defined!")
Deep Dream functions defined!

Details of Deep Dream¶

1. Gradient Ascent vs Gradient Descent

Normal training (Gradient Descent):

Goal: minimize loss (make predictions better)
Update: weights = weights - learning_rate * gradient
We go downhill on the loss landscape

Deep Dream (Gradient Ascent):

Goal: maximize activations (make patterns stronger)
Update: image = image + learning_rate * gradient
We go uphill on the activation landscape

Loss Landscape:

    ╲                ╱
     ╲              ╱
      ╲     ↓     ╱      Descent: go DOWN to find the minimum
       ╲        ╱        (subtract the gradient)
        ╲______╱
        minimum

Activation Landscape:

        maximum
         ______
        ╱      ╲         Ascent: go UP to find the maximum
       ╱    ↑   ╲        (add the gradient)
      ╱          ╲
2. Gradient Normalization

In normal training:

We don't usually normalize gradients this way. We just use them directly (or with optimizers like Adam that do their own scaling).

In Deep Dream:

# Normalize gradients (helps stabilize the process)
grad = image.grad.data
grad = grad / (grad.std() + 1e-8)  # Normalize by standard deviation

We normalize because:

  • Gradients can vary wildly in magnitude across different images and layers
  • Without normalization, some iterations would make huge changes and others tiny ones
  • Dividing by std() makes each step roughly the same size
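A minimal numeric sketch of this effect (NumPy stand-ins for the image gradient; the magnitudes are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two fake "gradients" with wildly different raw magnitudes:
small_grad = rng.normal(0.0, 1e-4, size=(3, 8, 8))
large_grad = rng.normal(0.0, 10.0, size=(3, 8, 8))

# The same normalization Deep Dream applies: divide by std (+ epsilon):
norm_small = small_grad / (small_grad.std() + 1e-8)
norm_large = large_grad / (large_grad.std() + 1e-8)

# After normalization both have std ≈ 1, so a fixed learning rate produces
# comparably sized pixel updates on every iteration, regardless of layer.
print(round(float(norm_small.std()), 3), round(float(norm_large.std()), 3))  # 1.0 1.0
```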

3. What is image.grad?

# Before backward():
image.data = [[0.5, 0.3], [0.8, 0.2]]  # pixel values
image.grad = None                       # no gradients yet

# After loss.backward():
image.data = [[0.5, 0.3], [0.8, 0.2]]  # unchanged
image.grad = [[0.02, -0.01], [0.05, 0.03]]  # now populated!

# After gradient ascent update:
image.data = [[0.52, 0.29], [0.85, 0.23]]  # modified!
image.grad = [[0.02, -0.01], [0.05, 0.03]]  # still there

# After zero_():
image.data = [[0.52, 0.29], [0.85, 0.23]]  # unchanged
image.grad = [[0.0, 0.0], [0.0, 0.0]]      # cleared for next iteration
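The same lifecycle can be checked directly in PyTorch (a small self-contained sketch with a toy loss, independent of the model; the update uses a `no_grad` block, which is equivalent to modifying `image.data` as the tutorial code does):

```python
import torch

# A tiny 2x2 "image" that we optimize directly, just like Deep Dream does.
image = torch.tensor([[0.5, 0.3], [0.8, 0.2]], requires_grad=True)

assert image.grad is None          # before backward(): no gradients yet

loss = (image ** 2).mean()         # stand-in "activation" loss
loss.backward()                    # populates image.grad

assert image.grad is not None      # after backward(): gradients exist

with torch.no_grad():
    image += 0.1 * image.grad      # gradient ascent updates the pixels

image.grad.zero_()                 # clear for the next iteration
print(image.grad.abs().sum().item())  # 0.0
```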

6. Let's Dream!¶

Now let's try it out. We'll use a sample image and see what different layers produce.

In [7]:
# Sky with clouds (patterns emerge beautifully)
sample_url = "https://images.unsplash.com/photo-1534088568595-a066f410bcda?w=600"

original_image = load_image(sample_url, max_size=400)
show_image(original_image, "Original Image")
In [8]:
# Dream on an early layer (edges and simple patterns)
early_dream = deep_dream(
    model,
    original_image,
    layer_name='Mixed_5b',  # Early-mid layer
    iterations=30,
    learning_rate=0.01
)
show_image(early_dream, "Deep Dream - Mixed_5b (Early Layer: Textures & Patterns)")
Iteration 5/30
Iteration 10/30
Iteration 15/30
Iteration 20/30
Iteration 25/30
Iteration 30/30
In [9]:
# Dream on a deeper layer (object parts and complex features)
deep_dream_result = deep_dream(
    model,
    original_image,
    layer_name='Mixed_6c',  # Deeper layer
    iterations=30,
    learning_rate=0.01
)
show_image(deep_dream_result, "Deep Dream - Mixed_6c (Deep Layer: Object Parts)")
Iteration 5/30
Iteration 10/30
Iteration 15/30
Iteration 20/30
Iteration 25/30
Iteration 30/30
In [10]:
# Dream on the deepest layer (full objects - eyes, faces, animals)
deepest_dream = deep_dream(
    model,
    original_image,
    layer_name='Mixed_7c',  # Deepest layer
    iterations=30,
    learning_rate=0.01
)
show_image(deepest_dream, "Deep Dream - Mixed_7c (Deepest Layer: Complex Objects)")
Iteration 5/30
Iteration 10/30
Iteration 15/30
Iteration 20/30
Iteration 25/30
Iteration 30/30

7. Multi-Scale Deep Dream (Octaves)¶

The basic algorithm works, but the results can look noisy. A common improvement is multi-scale processing (also called "octaves"):

  1. Start with a small version of the image
  2. Dream on it
  3. Upscale the result
  4. Add details from the original image
  5. Dream again
  6. Repeat

This creates more coherent, visually appealing results.

1. Octaves

Process the image at multiple scales (small → large), like looking at it from far away, then progressively closer.

num_octaves = 4
octave_scale = 1.4
original image = 512x512


Octave sizes (small to large):
  octave 0: 512 / 1.4³ = 186x186  (far away - big patterns)
  octave 1: 512 / 1.4² = 261x261  
  octave 2: 512 / 1.4¹ = 365x365  
  octave 3: 512 / 1.4⁰ = 512x512  (close up - fine details)

Why?

  • Small scale → network sees the whole image → creates large, coherent patterns
  • Large scale → network sees details → adds fine texture

2. What is interpolate? Why do we need it?

interpolate = resize an image

  • Original: 512x512

  • We want: 186x186

    scaled_image = torch.nn.functional.interpolate(
        original,           # input tensor
        size=(186, 186),    # target size
        mode='bilinear',    # smooth resizing (not blocky)
        align_corners=False
    )

Visual example:

  ```
  Original (512x512):          Interpolated (186x186):
  ┌─────────────────┐          ┌───────┐
  │                 │          │       │
  │      🐱         │   ──→    │  🐱   │
  │                 │          │       │
  │                 │          └───────┘
  └─────────────────┘          (same image, smaller)
  ```

Why needed? The model can only dream at one size at a time, so we resize to process the image at different scales.

3. Why extract detail?

We want to accumulate only the new patterns, not the original image:

  detail = dreamed - scaled_image

  #        ↑         ↑
  #        output    input (without previous detail)
  #        
  #        = only the NEW patterns added by this octave

Visual example:

scaled_image:     dreamed:           detail:
┌───────┐        ┌───────┐          ┌───────┐
│       │        │ ~∿∿~  │          │ ~∿∿~  │
│  🐱   │   +    │  🐱   │    =     │       │  (just the swirls)
│       │  dream │ ~∿∿~  │    -     │ ~∿∿~  │
└───────┘        └───────┘          └───────┘

Why? So we can upscale just the patterns and add more detail in the next octave, without duplicating the base image.
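The bookkeeping can be sketched numerically (a hedged NumPy illustration; the "dream" here is a fake additive pattern, not a real network pass):

```python
import numpy as np

rng = np.random.default_rng(1)

scaled_image = rng.uniform(0.0, 1.0, size=(3, 4, 4))  # base image at this octave
swirls = 0.05 * rng.normal(size=(3, 4, 4))            # fake dreamed-in pattern

dreamed = scaled_image + swirls   # what deep_dream() would hand back

# Extract only the new patterns, exactly as in the octave loop:
detail = dreamed - scaled_image

# Adding the detail back onto the base reconstructs the dreamed result,
# which is why upscaling just `detail` never duplicates the base image.
print(np.allclose(scaled_image + detail, dreamed))  # True
```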

In [11]:
def deep_dream_octaves(
    model: DeepDreamModel,
    image: torch.Tensor,
    layer_name: str,
    num_octaves: int = 4,
    octave_scale: float = 1.4,
    iterations_per_octave: int = 100,
    learning_rate: float = 0.01,
    show_progress: bool = True
) -> torch.Tensor:
    """
    Multi-scale Deep Dream for better quality results.

    Args:
        model: The DeepDream model wrapper
        image: Input image tensor
        layer_name: Which layer to dream on
        num_octaves: Number of scales to process
        octave_scale: Scale factor between octaves
        iterations_per_octave: Optimization steps per scale
        learning_rate: Step size
        show_progress: Whether to print progress

    Returns:
        Dreamed image tensor
    """
    original = image.clone()

    # Calculate sizes for each octave (from smallest to largest)
    _, _, h, w = image.shape
    octave_sizes = []
    for i in range(num_octaves - 1, -1, -1):
        scale = octave_scale ** i
        octave_sizes.append((int(h / scale), int(w / scale)))

    # Start with zeros (detail will be added)
    detail = torch.zeros_like(image)

    for octave_idx, (oh, ow) in enumerate(octave_sizes):
        if show_progress:
            print(f"\nOctave {octave_idx + 1}/{num_octaves} - Size: {ow}x{oh}")

        # Resize image and detail to current octave size
        scaled_image = torch.nn.functional.interpolate(
            original, size=(oh, ow), mode='bilinear', align_corners=False
        )
        scaled_detail = torch.nn.functional.interpolate(
            detail, size=(oh, ow), mode='bilinear', align_corners=False
        )

        # Add detail from previous octave
        dream_input = scaled_image + scaled_detail

        # Dream at this scale
        dreamed = deep_dream(
            model,
            dream_input,
            layer_name,
            iterations=iterations_per_octave,
            learning_rate=learning_rate,
            show_progress=False
        )

        # Extract the detail (difference between dreamed and input)
        detail = dreamed - scaled_image

        # Upscale detail for next octave
        if octave_idx < len(octave_sizes) - 1:
            next_h, next_w = octave_sizes[octave_idx + 1]
            detail = torch.nn.functional.interpolate(
                detail, size=(next_h, next_w), mode='bilinear', align_corners=False
            )

    # Final result: original + accumulated detail
    final = original + torch.nn.functional.interpolate(
        detail, size=(h, w), mode='bilinear', align_corners=False
    )

    return final


print("Multi-scale Deep Dream defined!")
Multi-scale Deep Dream defined!

Full Flow Visualization¶

Octave 1 (186x186):
  scaled_image ──────────────────┐
                                 ├──→ dream_input ──→ [DREAM] ──→ dreamed
  detail (zeros) ────────────────┘                                   │
                                                                     │
                              detail = dreamed - scaled_image ←──────┘
                                         │
                                    [UPSCALE to 261x261]
                                         │
                                         ▼
Octave 2 (261x261):
  scaled_image ──────────────────┐
                                 ├──→ dream_input ──→ [DREAM] ──→ dreamed
  detail (from octave 1) ────────┘                                   │
                                                                     │
                              detail = dreamed - scaled_image ←──────┘
                                         │
                                    [UPSCALE to 365x365]
                                         │
                                         ▼
                                       ... continue ...
                                         │
                                         ▼
Final: original + detail (upscaled to 512x512)
In [12]:
# Try the multi-scale version
octave_dream = deep_dream_octaves(
    model,
    original_image,
    layer_name='Mixed_6c',
    num_octaves=4,
    octave_scale=1.4,
    iterations_per_octave=30,
    learning_rate=0.01
)

show_image(octave_dream, "Multi-Scale Deep Dream (Mixed_6c)")
Octave 1/4 - Size: 112x145

Octave 2/4 - Size: 158x204

Octave 3/4 - Size: 221x285

Octave 4/4 - Size: 310x400

8. Compare Different Layers¶

Let's visualize what different layers "see" by dreaming on multiple layers.

In [13]:
def compare_layers(model, image, layers, **kwargs):
    """Dream on multiple layers and display results side by side."""
    n_layers = len(layers)
    fig, axes = plt.subplots(1, n_layers + 1, figsize=(5 * (n_layers + 1), 5))

    # Show original
    axes[0].imshow(tensor_to_image(image))
    axes[0].set_title("Original")
    axes[0].axis('off')

    # Dream on each layer
    for idx, layer in enumerate(layers):
        print(f"\nProcessing layer: {layer}")
        dreamed = deep_dream_octaves(model, image, layer, show_progress=False, **kwargs)
        axes[idx + 1].imshow(tensor_to_image(dreamed))
        axes[idx + 1].set_title(f"{layer}")
        axes[idx + 1].axis('off')

    plt.tight_layout()
    plt.show()


# Compare early vs deep layers
compare_layers(
    model,
    original_image,
    ['Mixed_5b', 'Mixed_6b', 'Mixed_7b'],
    num_octaves=3,
    iterations_per_octave=30
)
Processing layer: Mixed_5b

Processing layer: Mixed_6b

Processing layer: Mixed_7b

9. Why Do We See Dogs and Eyes?¶

You might notice that Deep Dream often produces animal-like features, eyes, and fur patterns. This isn't random—it's because:

  1. InceptionV3 was trained on ImageNet, which has over 100 dog breeds (out of 1000 total classes)
  2. The network has learned to be very good at detecting animal features
  3. When we amplify activations, we amplify whatever patterns the network knows best

This is actually a profound insight into neural networks: they have biases based on their training data, and those biases become visible through techniques like Deep Dream.

Connection to Modern AI Interpretability¶

Deep Dream was one of the first "feature visualization" techniques. Today, researchers use similar ideas to:

  • Understand what models have learned (interpretability)
  • Find model biases (fairness)
  • Create adversarial examples (security)
  • Generate art (creative AI)

The core principle—using gradients to modify inputs—remains fundamental.

10. Try Your Own Images!¶

Here's a convenient function to dream on any image:

In [14]:
def dream_on_image(
    image_path_or_url: str,
    layer: str = 'Mixed_6c',
    max_size: int = 512,
    num_octaves: int = 4,
    iterations: int = 10,
    learning_rate: float = 0.01
):
    """
    Convenient function to apply Deep Dream to any image.

    Args:
        image_path_or_url: Path to local image or URL
        layer: Which layer to dream on (see model.get_available_layers())
        max_size: Maximum image dimension
        num_octaves: Number of scales (more = slower but better)
        iterations: Steps per octave
        learning_rate: How strong the effect is
    """
    # Load image
    image = load_image(image_path_or_url, max_size=max_size)

    # Show original
    print("Original:")
    show_image(image, "Original")

    # Dream
    dreamed = deep_dream_octaves(
        model, image, layer,
        num_octaves=num_octaves,
        iterations_per_octave=iterations,
        learning_rate=learning_rate
    )

    # Show result
    print(f"\nDreamed ({layer}):")
    show_image(dreamed, f"Deep Dream - {layer}")

    return dreamed


# Example usage:
# result = dream_on_image("your_image.jpg", layer='Mixed_6c')
#
# Or with a URL:
# result = dream_on_image("https://example.com/image.jpg")
In [15]:
model.get_available_layers()
Out[15]:
['Conv2d_1a',
 'Conv2d_2b',
 'Mixed_5b',
 'Mixed_5c',
 'Mixed_5d',
 'Mixed_6a',
 'Mixed_6b',
 'Mixed_6c',
 'Mixed_6d',
 'Mixed_6e',
 'Mixed_7a',
 'Mixed_7b',
 'Mixed_7c']
In [16]:
# Try it with another sample image
# nature_url = "https://images.pexels.com/photos/414612/pexels-photo-414612.jpeg"

# Mountain landscape
# nature_url = "https://images.unsplash.com/photo-1506905925346-21bda4d32df4?w=600"

# Forest
nature_url = "https://images.unsplash.com/photo-1448375240586-882707db888b?w=600"

# Architecture (buildings create interesting geometric patterns)
# nature_url = "https://images.unsplash.com/photo-1486325212027-8081e485255e?w=600"

# Ocean waves
# nature_url = "https://images.unsplash.com/photo-1505118380757-91f5f5632de0?w=600"

# Abstract texture (good for showing pure pattern amplification)
# nature_url = "https://images.unsplash.com/photo-1558591710-4b4a1ae0f04d?w=600"


result = dream_on_image(nature_url, layer='Mixed_6e', num_octaves=4, iterations=30)
Original:
Octave 1/4 - Size: 186x124

Octave 2/4 - Size: 261x173

Octave 3/4 - Size: 365x243

Octave 4/4 - Size: 512x341

Dreamed (Mixed_6e):
In [17]:
result = dream_on_image(nature_url, layer='Mixed_7c', num_octaves=4, iterations=30)
Original:
Octave 1/4 - Size: 186x124

Octave 2/4 - Size: 261x173

Octave 3/4 - Size: 365x243

Octave 4/4 - Size: 512x341

Dreamed (Mixed_7c):

11. Save Your Creations¶

In [18]:
def save_dream(tensor: torch.Tensor, filename: str):
    """Save a dreamed image to file."""
    img = tensor_to_image(tensor)
    img_pil = Image.fromarray((img * 255).astype(np.uint8))
    img_pil.save(filename)
    print(f"Saved to {filename}")


# Example:
# save_dream(result, "my_dream.png")

Summary¶

We've built Deep Dream from scratch! Here's what we learned:

  1. Neural networks detect hierarchical features - edges → textures → parts → objects
  2. Gradient ascent on inputs - instead of training the network, we modify the image
  3. Multi-scale processing - improves visual quality
  4. Networks have biases - they "see" what they were trained to recognize

Key Takeaways for Understanding AI:¶

  • Feature visualization reveals what networks learn
  • Training data biases become visible in network behavior
  • The same gradient-based principles underpin adversarial examples, style transfer, and more

Next Steps:¶

  • Try different pre-trained models (VGG, ResNet)
  • Experiment with maximizing specific neurons instead of whole layers
  • Combine with style transfer for even more creative results
  • Explore modern interpretability tools like Lucid or TransformerLens

Thanks for reading! If you found this helpful, consider following for more deep learning tutorials.

In [19]:
!jupyter nbconvert --to html https://colab.research.google.com/drive/1EvQjVea0M3qkUVMkil3ZtujnGcRGjBB_?usp=drive_link
[NbConvertApp] WARNING | pattern '/content/deep_dream_tutorial.ipynb' matched no files
This application is used to convert notebook files (*.ipynb)
        to various other formats.

        WARNING: THE COMMANDLINE INTERFACE MAY CHANGE IN FUTURE RELEASES.

Options
=======
The options below are convenience aliases to configurable class-options,
as listed in the "Equivalent to" description-line of the aliases.
To see all configurable class-options for some <cmd>, use:
    <cmd> --help-all

--debug
    set log level to logging.DEBUG (maximize logging output)
    Equivalent to: [--Application.log_level=10]
--show-config
    Show the application's configuration (human-readable format)
    Equivalent to: [--Application.show_config=True]
--show-config-json
    Show the application's configuration (json format)
    Equivalent to: [--Application.show_config_json=True]
--generate-config
    generate default config file
    Equivalent to: [--JupyterApp.generate_config=True]
-y
    Answer yes to any questions instead of prompting.
    Equivalent to: [--JupyterApp.answer_yes=True]
--execute
    Execute the notebook prior to export.
    Equivalent to: [--ExecutePreprocessor.enabled=True]
--allow-errors
    Continue notebook execution even if one of the cells throws an error and include the error message in the cell output (the default behaviour is to abort conversion). This flag is only relevant if '--execute' was specified, too.
    Equivalent to: [--ExecutePreprocessor.allow_errors=True]
--stdin
    read a single notebook file from stdin. Write the resulting notebook with default basename 'notebook.*'
    Equivalent to: [--NbConvertApp.from_stdin=True]
--stdout
    Write notebook output to stdout instead of files.
    Equivalent to: [--NbConvertApp.writer_class=StdoutWriter]
--inplace
    Run nbconvert in place, overwriting the existing notebook (only
            relevant when converting to notebook format)
    Equivalent to: [--NbConvertApp.use_output_suffix=False --NbConvertApp.export_format=notebook --FilesWriter.build_directory=]
--clear-output
    Clear output of current file and save in place,
            overwriting the existing notebook.
    Equivalent to: [--NbConvertApp.use_output_suffix=False --NbConvertApp.export_format=notebook --FilesWriter.build_directory= --ClearOutputPreprocessor.enabled=True]
--coalesce-streams
    Coalesce consecutive stdout and stderr outputs into one stream (within each cell).
    Equivalent to: [--NbConvertApp.use_output_suffix=False --NbConvertApp.export_format=notebook --FilesWriter.build_directory= --CoalesceStreamsPreprocessor.enabled=True]
--no-prompt
    Exclude input and output prompts from converted document.
    Equivalent to: [--TemplateExporter.exclude_input_prompt=True --TemplateExporter.exclude_output_prompt=True]
--no-input
    Exclude input cells and output prompts from converted document.
            This mode is ideal for generating code-free reports.
    Equivalent to: [--TemplateExporter.exclude_output_prompt=True --TemplateExporter.exclude_input=True --TemplateExporter.exclude_input_prompt=True]
--allow-chromium-download
    Whether to allow downloading chromium if no suitable version is found on the system.
    Equivalent to: [--WebPDFExporter.allow_chromium_download=True]
--disable-chromium-sandbox
    Disable chromium security sandbox when converting to PDF..
    Equivalent to: [--WebPDFExporter.disable_sandbox=True]
--show-input
    Shows code input. This flag is only useful for dejavu users.
    Equivalent to: [--TemplateExporter.exclude_input=False]
--embed-images
    Embed the images as base64 dataurls in the output. This flag is only useful for the HTML/WebPDF/Slides exports.
    Equivalent to: [--HTMLExporter.embed_images=True]
--sanitize-html
    Whether the HTML in Markdown cells and cell outputs should be sanitized..
    Equivalent to: [--HTMLExporter.sanitize_html=True]
--log-level=<Enum>
    Set the log level by value or name.
    Choices: any of [0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
    Default: 30
    Equivalent to: [--Application.log_level]
--config=<Unicode>
    Full path of a config file.
    Default: ''
    Equivalent to: [--JupyterApp.config_file]
--to=<Unicode>
    The export format to be used, either one of the built-in formats
            ['asciidoc', 'custom', 'html', 'latex', 'markdown', 'notebook', 'pdf', 'python', 'qtpdf', 'qtpng', 'rst', 'script', 'slides', 'webpdf']
            or a dotted object name that represents the import path for an
            ``Exporter`` class
    Default: ''
    Equivalent to: [--NbConvertApp.export_format]
--template=<Unicode>
    Name of the template to use
    Default: ''
    Equivalent to: [--TemplateExporter.template_name]
--template-file=<Unicode>
    Name of the template file to use
    Default: None
    Equivalent to: [--TemplateExporter.template_file]
--theme=<Unicode>
    Template specific theme(e.g. the name of a JupyterLab CSS theme distributed
    as prebuilt extension for the lab template)
    Default: 'light'
    Equivalent to: [--HTMLExporter.theme]
--sanitize_html=<Bool>
    Whether the HTML in Markdown cells and cell outputs should be sanitized.This
    should be set to True by nbviewer or similar tools.
    Default: False
    Equivalent to: [--HTMLExporter.sanitize_html]
--writer=<DottedObjectName>
    Writer class used to write the
                                        results of the conversion
    Default: 'FilesWriter'
    Equivalent to: [--NbConvertApp.writer_class]
--post=<DottedOrNone>
    PostProcessor class used to write the
                                        results of the conversion
    Default: ''
    Equivalent to: [--NbConvertApp.postprocessor_class]
--output=<Unicode>
    Overwrite base name use for output files.
                Supports pattern replacements '{notebook_name}'.
    Default: '{notebook_name}'
    Equivalent to: [--NbConvertApp.output_base]
--output-dir=<Unicode>
    Directory to write output(s) to. Defaults
                                  to output to the directory of each notebook. To recover
                                  previous default behaviour (outputting to the current
                                  working directory) use . as the flag value.
    Default: ''
    Equivalent to: [--FilesWriter.build_directory]
--reveal-prefix=<Unicode>
    The URL prefix for reveal.js (version 3.x).
            This defaults to the reveal CDN, but can be any url pointing to a copy
            of reveal.js.
            For speaker notes to work, this must be a relative path to a local
            copy of reveal.js: e.g., "reveal.js".
            If a relative path is given, it must be a subdirectory of the
            current directory (from which the server is run).
            See the usage documentation
            (https://nbconvert.readthedocs.io/en/latest/usage.html#reveal-js-html-slideshow)
            for more details.
    Default: ''
    Equivalent to: [--SlidesExporter.reveal_url_prefix]
--nbformat=<Enum>
    The nbformat version to write.
            Use this to downgrade notebooks.
    Choices: any of [1, 2, 3, 4]
    Default: 4
    Equivalent to: [--NotebookExporter.nbformat_version]

Examples
--------

    The simplest way to use nbconvert is

            > jupyter nbconvert mynotebook.ipynb --to html

            Options include ['asciidoc', 'custom', 'html', 'latex', 'markdown', 'notebook', 'pdf', 'python', 'qtpdf', 'qtpng', 'rst', 'script', 'slides', 'webpdf'].

            > jupyter nbconvert --to latex mynotebook.ipynb

            Both HTML and LaTeX support multiple output templates. LaTeX includes
            'base', 'article' and 'report'.  HTML includes 'basic', 'lab' and
            'classic'. You can specify the flavor of the format used.

            > jupyter nbconvert --to html --template lab mynotebook.ipynb

            You can also pipe the output to stdout, rather than a file

            > jupyter nbconvert mynotebook.ipynb --stdout

            PDF is generated via latex

            > jupyter nbconvert mynotebook.ipynb --to pdf

            You can get (and serve) a Reveal.js-powered slideshow

            > jupyter nbconvert myslides.ipynb --to slides --post serve

            Multiple notebooks can be given at the command line in a couple of
            different ways:

            > jupyter nbconvert notebook*.ipynb
            > jupyter nbconvert notebook1.ipynb notebook2.ipynb

            or you can specify the notebooks list in a config file, containing::

                c.NbConvertApp.notebooks = ["my_notebook.ipynb"]

            > jupyter nbconvert --config mycfg.py

To see all available configurables, use `--help-all`.

In [20]:
# Mount Google Drive so its files are accessible under /content/drive
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive